Abstract

The exact average complexity analysis of the basic sphere decoder for general space-time codes applied to the
multiple-input multiple-output (MIMO) wireless channel is known to be difficult. In this work, we shed light on the
computational complexity of sphere decoding for the quasi-static, LAttice Space-Time (LAST) coded MIMO channel.
Specifically, we derive an upper bound on the tail distribution of the decoder's computational complexity. We show that,
when the computational complexity exceeds a certain limit, this upper bound becomes dominated by the outage probability
achieved by LAST coding and sphere decoding schemes. We then calculate the minimum average computational complexity that
is required by the decoder to achieve near-optimal performance in terms of the system parameters. Our results indicate
that there exists a cut-off rate (multiplexing gain) for which the average complexity remains bounded.

I. Introduction

Since its introduction to multiple-input multiple-output (MIMO) wireless communication systems, the sphere decoder
has become an attractive, efficient implementation of the maximum-likelihood (ML) decoder, especially for small signal
dimensions and/or moderate to large SNRs. Such a decoder allows for a significant reduction in (average) decoding
complexity compared with the exhaustive ML decoder without sacrificing performance. In general, the sphere decoder [8]-[11] is
commonly used in communication systems that can be well described by the following (real) linear Gaussian vector
channel model

$$y = Mx + e, \tag{1}$$

where $x \in \mathbb{R}^m$ is the input to the channel, $y \in \mathbb{R}^n$ is the output of the channel, $e \in \mathbb{R}^n$ is the additive Gaussian noise
vector whose entries are independent, identically distributed, zero-mean Gaussian random variables with variance 1/2,
and $M \in \mathbb{R}^{n \times m}$ is a matrix representing the linear mapping of the channel.
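As a small illustration of the model in (1), the following sketch draws one channel output for a given matrix and input. The function name `simulate_channel`, the example matrix, and the example input are hypothetical and chosen only for illustration; the only property taken from the text is that each noise entry is zero-mean Gaussian with variance 1/2.

```python
import random

def simulate_channel(M, x, rng):
    """Return y = M x + e, where the entries of e are i.i.d. N(0, 1/2)."""
    sigma = 0.5 ** 0.5  # standard deviation that gives variance 1/2
    y = []
    for row in M:
        noiseless = sum(m_ij * x_j for m_ij, x_j in zip(row, x))
        y.append(noiseless + rng.gauss(0.0, sigma))
    return y

# Illustrative values only: n = 3 outputs, m = 2 inputs.
rng = random.Random(0)
M = [[1.0, 0.5], [0.0, 2.0], [1.5, -1.0]]
x = [1, -2]
y = simulate_channel(M, x, rng)
```

With this normalization the expected noise energy is $E\|e\|^2 = n/2$, which is the quantity that later drives the choice of the sphere radius.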

The input-output relation in (1) allows for the use of lattice theory [1] to analyze
many digital communication systems. In this paper, we assume that $x$ is a codeword selected from a lattice code. Let
$\Lambda_c = \Lambda(G) = \{x = Gz : z \in \mathbb{Z}^m\}$ be a lattice in $\mathbb{R}^m$, where $G$ is an $m \times m$ full-rank lattice generator matrix. The
Voronoi cell, $\mathcal{V}_x(G)$, that corresponds to the lattice point $x \in \Lambda_c$ is the set of points in $\mathbb{R}^m$ closer to $x$ than to any
other point $\lambda \in \Lambda_c$, with volume given by $V_c = \mathrm{Vol}(\mathcal{V}_x(G)) = \sqrt{\det(G^T G)}$. An $m$-dimensional lattice code
$\mathcal{C}(\Lambda_c, u_0, \mathcal{R})$ is the finite subset of the lattice translate $\Lambda_c + u_0$ inside the shaping region $\mathcal{R}$, i.e., $\mathcal{C} = \{\Lambda_c + u_0\} \cap \mathcal{R}$,
where $\mathcal{R}$ is a bounded measurable region$^1$ of $\mathbb{R}^m$.

Space-time codes based on lattices have been used in MIMO channels due to their low encoding complexity (e.g.,
nested or Voronoi codes [1]) and their capability of achieving excellent error performance (see [14]). Another important
aspect of lattice space-time (LAST) codes is that they can be decoded by a class of efficient decoders known
as lattice decoders. These decoding algorithms reduce complexity by relaxing the code boundary constraint and finding the
point of the underlying (infinite) lattice closest to the received point, i.e.,

$$\hat{x} = \arg\min_{x \in \Lambda_c} \|y - Mx\|^2. \tag{2}$$

It is well known that sphere decoding based on the Fincke-Pohst and Schnorr-Euchner enumerations is an efficient strategy
to solve (2), and it has been widely used for signal detection in MIMO systems of small dimensions (see [3] and references
therein). In this work, for the sake of simplicity, we consider the Fincke-Pohst sphere decoder without radius reduction
and restarting (see [3] for more details).

$^1$In this paper, we consider a shaping region $\mathcal{R}$ that corresponds to the Voronoi cell $\mathcal{V}_s$ of a sublattice $\Lambda_s$ of $\Lambda_c$, i.e., $\Lambda_s \subseteq \Lambda_c$. The generated
codes are called nested (or Voronoi) lattice codes (see [15] for more details).

A. Previous Work 

To date, most of the work on the complexity of sphere decoding has focused on characterizing the mean and the variance
of the decoder's complexity, particularly for the uncoded MIMO channel (e.g., V-BLAST) [8]-[10]. Seethaler et al.
[11] derived the computational distribution of the sphere decoder for the $M \times N$ uncoded MIMO
channel. Characterizing and understanding the complexity distribution is important, especially when the sphere decoder
is used under practically relevant runtime constraints. The computational tail distribution is defined as $\Pr(C \ge L)$, where
$C$ is the overall decoding complexity and $L$ is the distribution parameter. It has been shown in [11] that, as $L \to \infty$,
the computational tail distribution is of Pareto type with tail exponent $N - M + 1$, i.e.,

$$\Pr(C \ge L) \doteq L^{-(N-M+1)}, \qquad L \to \infty.$$

However, the main drawback of their work is that the decoder's complexity is analysed only as the number
of computations performed by the decoder increases without bound. In other words, although the behaviour of the tail
distribution is characterized, they do not specify a value of $L$ that indicates when the computations become excessive. In
fact, when the decoder is used under practical runtime constraints, it is sometimes desirable to allow the decoder to
terminate the search once a limit is exceeded and declare an error.

Achieving higher diversity and multiplexing gains requires incorporating error control coding (across antennas and
time) at the transmitter. Several works have considered the computational complexity analysis of optimal and
sub-optimal decoders applied to LAST coded $M \times N$ MIMO channels [3]-[7]. A first attempt toward specifying the exact
complexity required by the decoder to achieve the optimal diversity-multiplexing tradeoff (DMT) [13] of the quasi-static
LAST coded MIMO channel was made in [6]. It was shown that the optimal tradeoff can be achieved using lattice
reduction (LR) aided linear decoders at a worst-case complexity $O(\log\rho)$, where $\rho$ is the SNR. This corresponds to
a linear increase in complexity as a function of the code rate $R$ in the high SNR regime, where $R = r\log\rho$ with
$r \le \min\{M, N\}$ referred to as the multiplexing gain of the coding scheme. However, this very low decoding complexity
comes at the expense of a large gap from the sphere decoder's error performance. In order to close this gap, lattice
sequential decoding algorithms [4] are considered efficient decoders that achieve good error performance with much lower
decoding complexity. In [5], we analysed in detail the computational tail distribution and the average complexity
of the lattice sequential decoder. Specifically, we showed that when the computational complexity exceeds a certain
limit, the tail distribution becomes upper bounded by the outage probability achieved by LAST coding and sequential
decoding schemes, i.e.,

$$\Pr(C \ge L) \,\dot{\le}\, \rho^{-d_{\mathrm{out}}(r)}, \qquad L \ge L_0,$$

where $d_{\mathrm{out}}(r)$ is the DMT achieved by the underlying coding and decoding schemes. This result indicates
that one may save on decoding complexity while still achieving near-outage performance by setting a time-out limit at
the decoder, so that when the computational complexity exceeds the limit, the decoder terminates the search and declares
an error. In this work, for the sake of completeness, we study the complexity behaviour of the optimal sphere decoder
in the quasi-static MIMO channel. This allows us to compare the complexity of the sphere decoder with that of other
low-complexity decoders and to see whether the complexity savings of such decoders are worth the loss in performance.

The work in [7] considers sphere decoding algorithms that perform exact ML decoding. Such algorithms take into
consideration the coding boundary constraint to achieve the ML performance, while in this paper we consider sphere
decoding algorithms that perform lattice decoding. To be specific, it has been shown in [7] that the total number of
computations required by the sphere decoder to achieve a vanishing gap from the ML performance is given asymptotically
(at high SNR) by $\rho^{c(r)}$, where $c(r) = Tr(1 - r/M)$, $\forall\, 0 \le r \le M$, has been defined as the sphere decoder's SNR
complexity exponent. Compared to the exhaustive ML decoder's complexity $\rho^{rT}$, there is a $(1 - r/M)$ reduction factor
in the complexity SNR exponent achieved by the sphere decoder without sacrificing performance. However, two
important aspects of the complexity behaviour of the sphere decoder have not been considered in [7]. First, at
multiplexing gain $r = 0$, i.e., at fixed coding rates $R$, one cannot realize the complexity saving achieved by
the sphere decoder using the above definition of the computational complexity. This is important, since most of
the experiments on the complexity of various sphere decoding algorithms are performed using channel codes with fixed
rates. Second, the comparisons between several decoders applied to the outage-limited MIMO channel are usually
performed experimentally through their corresponding average decoding complexity, a topic that was not considered in
[7].

B. Main Contribution 

The main contribution of this work concerns the complexity tail distribution and the (exact) average computational
complexity of the sphere decoder for the LAST coded MIMO channel. Specifically, we consider the analysis of the decoder
when minimum mean-square error decision-feedback equalization (MMSE-DFE) is applied. We derive the asymptotic
average complexity of the decoder in terms of the system parameters: the SNR $\rho$, the number of transmit antennas $M$,
the number of receive antennas $N$, and the codeword length $T$. For this type of decoding, we specify the system
parameters that are needed to achieve the corresponding DMT with fairly low decoding complexity. In general, it is
shown that the sphere decoder has much lower asymptotic complexity than the exhaustive ML decoder. Moreover, we
show that there exists a cut-off rate (multiplexing gain) for which the average complexity remains bounded. Simulation
results are used to verify our theoretical analysis and claims.

Throughout the paper, we use the following notation. The superscript $^c$ denotes complex quantities, $^T$ denotes transpose,
and $^H$ denotes Hermitian transpose. We write $g(z) \doteq z^a$ to mean $\lim_{z\to\infty} \log g(z)/\log(z) = a$; $\dot{\ge}$ and $\dot{\le}$ are used similarly. For
a bounded Jordan-measurable region $\mathcal{R} \subset \mathbb{R}^m$, $V(\mathcal{R})$ denotes the volume of $\mathcal{R}$. We denote by $S^m_x(r)$ the $m$-dimensional
hypersphere of radius $r$ centred at $x$, with $V(S^m_x(r)) = (\pi r^2)^{m/2}/\Gamma(m/2 + 1)$, and $I_m$ denotes the $m \times m$ identity matrix.
The notation $v \sim \mathcal{N}(\mu, K)$ indicates that $v$ is a real Gaussian random vector with mean $\mu$ and covariance matrix $K$.
The complement of a set $A$ is denoted by $\bar{A}$.

II. LAST Coding and Lattice Decoding 

We consider a quasi-static, Rayleigh fading MIMO channel with $M$ transmit and $N$ receive antennas, no channel
state information (CSI) at the transmitter, and perfect CSI at the receiver. The complex base-band model of the received
signal (for $T$ channel uses) can be described by

$$Y^c = \sqrt{\rho}\, H^c X^c + W^c, \tag{3}$$

where $X^c \in \mathbb{C}^{M\times T}$ is the transmitted space-time code matrix, $Y^c \in \mathbb{C}^{N\times T}$ is the received signal matrix, $W^c \in \mathbb{C}^{N\times T}$
is the noise matrix, $H^c \in \mathbb{C}^{N\times M}$ is the channel matrix, and $\rho = \mathrm{SNR}/M$ is the normalized SNR at each receive
antenna with respect to $M$. The elements of both the noise matrix and the channel fading gain matrix are assumed to
be independent identically distributed (i.i.d.) zero-mean circularly symmetric complex Gaussian random variables with
variance $\sigma^2 = 1$.

An $M \times T$ space-time coding scheme is a full-dimensional LAttice Space-Time (LAST) code if its vectorized (real)
codebook (corresponding to the channel model (1)) is a lattice code with dimension $m = 2MT$. As discussed in [14],
the design of space-time signals reduces to the construction of a codebook $\mathcal{C} \subseteq \mathbb{R}^{2MT}$ with code rate $R = \frac{1}{T}\log|\mathcal{C}|$,
satisfying the input averaging power constraint

$$\frac{1}{|\mathcal{C}|}\sum_{x \in \mathcal{C}} \|x\|^2 \le MT.$$



When lattice decoding is preceded by MMSE-DFE filtering$^2$, the equivalent real model of the above channel can
be shown to be given by (1) with $M$ that satisfies (see [14] for more details)

$$\det(M^T M) = \det\big(I_M + \rho (H^c)^H H^c\big)^{2T}. \tag{4}$$



$^2$Here, we perform the QR-decomposition on the augmented channel matrix

$$\tilde{H} = \begin{pmatrix} H \\ I_m \end{pmatrix} = \tilde{Q} R,$$

where $H$ is the real-valued equivalent channel gain matrix, $\tilde{Q} \in \mathbb{R}^{(n+m)\times m}$ has orthonormal columns, and $R \in \mathbb{R}^{m\times m}$ is upper triangular with
positive diagonal elements. If we let $Q = HR^{-1}$ be the upper $n \times m$ part of $\tilde{Q}$, then the matrices $F = Q^T$ and $B = R$ are called the MMSE-DFE
forward and backward filters, respectively. At the receiver, the received signal $y$ is multiplied by the forward filter matrix $F$ of the MMSE-DFE
to get $y' = Bx + e'$. This is equivalent to (1) with $M = B$, where $B$ has the property that $\det(B^T B) = [\det(I_M + \rho (H^c)^H H^c)]^{2T}$ (refer to [14]
for more details about this topic).
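The determinant property of the backward filter can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes a small real matrix `H` that is assumed to already absorb the SNR scaling, and it obtains the upper-triangular factor through a Cholesky factorization of $H^T H + I$ rather than through an explicit QR routine (both give the same triangular factor with positive diagonal, since $\tilde{H}^T\tilde{H} = H^T H + I = R^T R$). The helper names are hypothetical.

```python
import math

def gram_plus_identity(H):
    """Return H^T H + I for a real matrix H given as a list of rows."""
    m = len(H[0])
    return [[sum(row[i] * row[j] for row in H) + (1.0 if i == j else 0.0)
             for j in range(m)] for i in range(m)]

def cholesky_upper(A):
    """Return upper-triangular R with positive diagonal such that R^T R = A."""
    m = len(A)
    R = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            s = A[i][j] - sum(R[k][i] * R[k][j] for k in range(i))
            if i == j:
                R[i][i] = math.sqrt(s)
            else:
                R[i][j] = s / R[i][i]
    return R

# Illustrative 3 x 2 real channel (assumed to include the SNR scaling).
H = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]]
A = gram_plus_identity(H)          # H^T H + I
B = cholesky_upper(A)              # plays the role of the backward filter B = R
det_BtB = (B[0][0] * B[1][1]) ** 2  # det(B^T B) for a triangular B
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
```

For this example `det_BtB` and `det_A` agree up to rounding, which is the real-valued counterpart of the determinant identity stated in the footnote.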



Definition 1. Consider a family of LAST codes $\mathcal{C}_\rho$ for fixed $M$ and $T$, obtained from lattices of a given dimension
$m = 2MT$ and indexed by their operating SNR $\rho$. The code $\mathcal{C}_\rho$ has rate $R(\rho)$, average error probability $P_e(\rho)$, and
decoding computational complexity tail distribution $\Pr(C \ge L)$ (averaged over the random channel matrix $H^c$). The multiplexing gain,
diversity order, and complexity tail exponent are defined respectively as

$$r = \lim_{\rho\to\infty}\frac{R(\rho)}{\log\rho}, \qquad d = \lim_{\rho\to\infty}\frac{-\log P_e(\rho)}{\log\rho}, \qquad \eta = \lim_{\rho\to\infty}\frac{-\log\Pr(C \ge L)}{\log\rho}.$$

It has been shown in [14] that LAST coding and lattice decoding can achieve rates up to

$$R_{\mathrm{LAST}}(\rho, H^c) = \log\det(M^T M)^{1/2T} = \log\det\big(I_M + \rho (H^c)^H H^c\big). \tag{5}$$

For the underlying quasi-static MIMO channel, it is well known that the asymptotic error performance, $P_e(\rho)$, of any
coding and decoding scheme is dominated by the outage probability, $P_{\mathrm{out}}(\rho, R)$, i.e., $P_e(\rho) \doteq P_{\mathrm{out}}(\rho, R)$. For LAST
coding and lattice decoding schemes, the outage probability is defined by

$$P_{\mathrm{out}}(\rho, R) = \Pr\big(R > R_{\mathrm{LAST}}(\rho, H^c)\big) \doteq \rho^{-d^*_{\mathrm{out}}(r)}, \tag{6}$$

where $d^*_{\mathrm{out}}(r) = (M - r)(N - r)$, $\forall\, r \in [0, \min\{M, N\})$, is defined as the optimal diversity-multiplexing tradeoff
(DMT) achieved by the channel [13].
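To make the outage definition concrete, the following sketch estimates the outage probability by Monte Carlo simulation for the special case $M = N = 2$, and evaluates the tradeoff $(M - r)(N - r)$. It is a minimal illustration, not the simulation setup of this paper: the function names, the number of trials, and the restriction to a $2 \times 2$ channel are assumptions made here. The outage test $r\log\rho > \log\det(I + \rho (H^c)^H H^c)$ is implemented in the equivalent form $\rho^r > \det(I + \rho (H^c)^H H^c)$, so the base of the logarithm does not matter.

```python
import random

def dmt_exponent(r, M, N):
    """Return (M - r)(N - r), the outage exponent used in the text."""
    if not 0 <= r <= min(M, N):
        raise ValueError("multiplexing gain must lie in [0, min(M, N)]")
    return (M - r) * (N - r)

def outage_probability(rho, r, trials, rng):
    """Monte Carlo estimate of the outage probability for a 2 x 2 Rayleigh channel."""
    s = 0.5 ** 0.5  # per-real-dimension standard deviation of a CN(0, 1) entry
    outages = 0
    for _ in range(trials):
        h = [[complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(2)]
             for _ in range(2)]
        # Entries of the Gram matrix W = H^H H.
        w00 = abs(h[0][0]) ** 2 + abs(h[1][0]) ** 2
        w11 = abs(h[0][1]) ** 2 + abs(h[1][1]) ** 2
        w01 = h[0][0].conjugate() * h[0][1] + h[1][0].conjugate() * h[1][1]
        det = (1 + rho * w00) * (1 + rho * w11) - (rho ** 2) * abs(w01) ** 2
        if rho ** r > det:
            outages += 1
    return outages / trials

rng = random.Random(7)
p_low_snr = outage_probability(10.0, 1.0, 20000, rng)
p_high_snr = outage_probability(100.0, 1.0, 20000, rng)
```

At $r = 1$ the exponent is `dmt_exponent(1, 2, 2) = 1`, so the estimated outage probability is expected to fall roughly in proportion to $1/\rho$, up to logarithmic factors and Monte Carlo noise.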

Define the outage event $O(\rho) = \{H^c : R(\rho) > R_{\mathrm{LAST}}(\rho, H^c)\}$, and denote the transmission rate $R(\rho) = r\log\rho$. Let
$0 \le \lambda_1 \le \cdots \le \lambda_M$ be the ordered eigenvalues of $(H^c)^H H^c$, and define $\alpha = (\alpha_1, \cdots, \alpha_M)$, where $\alpha_i = -\log\lambda_i/\log\rho$.
As discussed in [13], at high SNR, only the non-negative values of $\alpha$ contribute to the outage event. Therefore, the
outage event can be expressed as

$$O = \Big\{\alpha \in \mathbb{R}^M_+ : \sum_{i=1}^{M}(1 - \alpha_i)^+ < r\Big\}. \tag{7}$$

In what follows, we summarize the results derived in [14]. For lattice decoding, there exists a sequence of
full-dimensional LAST codes that achieves the DMT

$$d^*_{\mathrm{out}}(r) = (M - r)(N - r), \qquad \forall\, r \in [0, \min\{M, N\}], \tag{8}$$

under the constraint $T \ge M + N - 1$ (see [14] for more details).

III. Lattice Decoding via Sphere Decoding 

While ML decoding performs an exhaustive search over all codewords $c \in \mathcal{C}(\Lambda_c, \mathcal{R})$, sphere decoding algorithms find
the lattice point $x \in \Lambda_c$ closest to the received signal $y$ within a sphere of radius $R_s$ centred at the received signal. It is
well known that the sphere decoder allows for a significant reduction in decoding complexity for small signal dimensions
and moderate to large values of SNR. Depending on whether the sphere decoder incorporates the boundaries of the lattice
code (i.e., $\mathcal{R}$) into the search algorithm or not, one can achieve ML or near-ML performance. Here, we consider sphere
decoding algorithms that perform lattice decoding, i.e., the class of decoding algorithms that do not take into account
the shaping region $\mathcal{R}$.

In general, the sphere decoder, after QR decomposition of the channel-code matrix $MG = QR$, finds all integer lattice
points $z \in \mathbb{Z}^m$ that satisfy the sphere constraint

$$\|y' - Rz\|^2 \le R_s^2, \tag{9}$$

where $y' = Q^T y$, $Q$ is an orthogonal matrix, and $R$ is an $m \times m$ upper triangular matrix with positive diagonal elements.
Because $R$ is upper triangular, (9) implies the partial constraints $\|y'^{\,k}_{1} - R_{kk} z^{k}_{1}\|^2 \le R_s^2$ for $k = 1, \cdots, m$, where
$z_1^k = [z_k, \cdots, z_2, z_1]^T$ denotes the last $k$ components of the integer vector $z$, $R_{kk}$ is the lower $k \times k$ part of the
matrix $R$, and $y'^{\,k}_{1}$ is the last $k$ components of the vector $y'$. The structure of $R$ allows one to perform a backward sequential
search from layer (dimension) $m$ to layer 1. Several algorithms have been developed to perform the search efficiently (see,
for example, [3]). Once all points are listed, one can find the point that is closest in distance to $y$.

It is more convenient to look at the sphere decoder as a search in a tree with $m$ layers. The $k$-th layer, where
$1 \le k \le m$, contains nodes that correspond to the partial integer lattice points $z_1^k \in \mathbb{Z}^k$. In this case, one may define
the computational complexity of the sphere decoder as the total number of nodes that have been visited (or extended)
by the decoder during the search.
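The tree search and the node-count notion of complexity can be sketched as follows. This is a minimal illustrative implementation of a Fincke-Pohst style enumeration without radius reduction, not the exact decoder analysed here: the function name `sphere_decode`, the 0-based indexing (the search starts at the last row of $R$ and moves up), and the example values are assumptions. The function returns the closest integer point found inside the sphere together with the number of visited nodes.

```python
import math

def sphere_decode(R, y, radius_sq):
    """Depth-first enumeration of integer points z with ||y - R z||^2 <= radius_sq.

    R is an m x m upper-triangular matrix (list of rows) with positive diagonal,
    y is the rotated received vector, and radius_sq is the squared sphere radius.
    Returns (best_point, nodes_visited); best_point is None if the sphere is empty.
    """
    m = len(R)
    z = [0] * m
    best = {"point": None, "dist": float("inf")}
    nodes = 0

    def search(i, dist_so_far):
        nonlocal nodes
        # Remove the contribution of the components fixed in earlier (deeper) layers.
        s = y[i] - sum(R[i][j] * z[j] for j in range(i + 1, m))
        half_width = math.sqrt(max(radius_sq - dist_so_far, 0.0)) / R[i][i]
        centre = s / R[i][i]
        lo = math.ceil(centre - half_width)
        hi = math.floor(centre + half_width)
        for cand in range(lo, hi + 1):
            z[i] = cand
            d = dist_so_far + (s - R[i][i] * cand) ** 2
            if d > radius_sq:
                continue  # guard against rounding at the sphere boundary
            nodes += 1  # this partial point satisfies its layer constraint
            if i == 0:
                if d < best["dist"]:
                    best["point"], best["dist"] = list(z), d
            else:
                search(i - 1, d)

    search(m - 1, 0.0)
    return best["point"], nodes

# Illustrative example: the transmitted integer vector is z = (1, -1).
R_example = [[2.0, 1.0], [0.0, 1.0]]
y_example = [1.1, -0.9]
point, visited = sphere_decode(R_example, y_example, 1.0)
```

The counter `nodes` plays the role of the complexity $C$ defined above: it is incremented once for every partial point that passes its layer constraint, so a larger radius or an ill-conditioned $R$ directly increases it.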

Define the indicator function $\phi(z_1^k)$ by

$$\phi(z_1^k) = \begin{cases} 1, & \text{if } z_1^k \text{ is extended};\\ 0, & \text{otherwise}. \end{cases} \tag{10}$$

Then, the total number of partial integer lattice points $z_1^k \in \mathbb{Z}^k$ found by the decoder at layer $k$ can be expressed as

$$C_k = \sum_{z_1^k \in \mathbb{Z}^k} \phi(z_1^k). \tag{11}$$

Therefore, the total computational complexity of the sphere decoder, $C$, that is required to find the closest lattice point
to the received signal is given by $C = \sum_{k=1}^{m} C_k$.

A. Sphere Radius Selection 

The selection of the initial radius $R_s$ at the beginning of the search is of crucial importance in the computational
complexity analysis. Choosing a small value of $R_s$ may result in finding no lattice points inside the sphere (i.e., $C_k = 0$
for some $1 \le k \le m$). On the other hand, choosing a very large value of $R_s$ results in finding too many lattice points
inside the sphere, which leads to very large computational complexity. As such, the sphere radius $R_s$ must be chosen
just large enough for the search sphere to contain at least one lattice point.

Selecting $R_s = r_{\mathrm{cov}}(MG)$, i.e., the covering radius$^3$ of the lattice generated by $MG$, guarantees the existence of at
least one lattice point inside the sphere. Unfortunately, the computation of $r_{\mathrm{cov}}$ for a general lattice is very difficult.
Another choice of $R_s$ is the distance between the Babai estimate and the vector $y$. As mentioned in [8], although this
choice guarantees the existence of at least one lattice point (the Babai estimate) inside the sphere, it is not clear in general
whether it leads to too many lattice points inside the sphere.
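The Babai-based radius mentioned above can be sketched as follows. The sketch assumes that the channel-code matrix has already been reduced to an upper-triangular factor `R` with positive diagonal and that `y` is the correspondingly rotated received vector; the Babai estimate is then obtained by successive rounding from the last component upwards. The function name and the example values are hypothetical.

```python
def babai_radius_sq(R, y):
    """Return (z_babai, squared distance ||y - R z_babai||^2) by successive rounding."""
    m = len(R)
    z = [0] * m
    for i in range(m - 1, -1, -1):
        # Cancel the already-decided components, then round to the nearest integer.
        s = y[i] - sum(R[i][j] * z[j] for j in range(i + 1, m))
        z[i] = round(s / R[i][i])
    dist_sq = sum((y[i] - sum(R[i][j] * z[j] for j in range(i, m))) ** 2
                  for i in range(m))
    return z, dist_sq

R_example = [[2.0, 1.0], [0.0, 1.0]]
y_example = [1.1, -0.9]
z_babai, radius_sq = babai_radius_sq(R_example, y_example)
```

Using `radius_sq` as the squared search radius guarantees a non-empty sphere, because the Babai point itself lies on its boundary; how many other points fall inside depends on the conditioning of `R`, which is the concern raised in the text.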

In this work, we follow a different approach to find a sphere radius that guarantees the existence of at least one lattice
point inside the sphere. This particular choice of the sphere radius is shown to simplify the analysis of deriving an upper 
bound on the decoder's computational complexity. The basic idea of this approach (as will be shown in the sequel) is 
to separate the typical noise events from the non-typical ones. This allows the separation of the "typical" lattice points 
(lattice points that are highly likely to be generated by the sphere decoder) from the atypical ones. 

B. The k-th Layer Complexity 

In this section, we provide some insight into the computational complexity of the sphere decoder
at the $k$-th layer (more details can be found in [10]). This will assist us in the derivation of the upper bound on the
computational complexity distribution, as will be shown in the sequel.

As mentioned previously, the computational complexity of the sphere decoder at the $k$-th layer is determined by the
total number of partial lattice points $z_1^k \in \mathbb{Z}^k$ that satisfy the $k$-th layer sphere constraint

$$\|y'^{\,k}_{1} - R_{kk} z_1^k\|^2 \le R_s^2.$$

We assume that $R_s$ is chosen sufficiently large so that at least one lattice point is found inside the sphere (details
on how $R_s$ is selected are given in Section IV). It is clear that the computational complexity of the decoder depends on
the distributions of $y'^{\,k}_{1}$ and $R_{kk}$. Since these two quantities are random, the computational complexity analysis of the
sphere decoder is considered difficult.

Most of the work on the complexity of sphere decoding (see [2], [11], [12]) relies on approximating $C_k$ by

$$C_k \approx \frac{V(S^k(R_s))}{\det(R_{kk}^T R_{kk})^{1/2}}, \tag{12}$$

where the approximation becomes more accurate for sufficiently large $R_s$. It should not be surprising that the $k$-th
layer complexity of the sphere decoder is inversely proportional to the volume of the Voronoi region of the lattice
generated by the partial upper triangular matrix $R_{kk}$. Since $R_{kk}$ is related to the channel matrix $H^c$, it is to be expected
that the computational complexity depends critically on the channel conditions, i.e., on whether the channel is
ill or well conditioned. We are now ready to establish our upper bound on the decoder's complexity tail distribution.
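The volume-ratio approximation of the $k$-th layer complexity can be evaluated directly. The sketch below is illustrative only: it uses the hypersphere volume formula from the notation paragraph and the fact that, for an upper-triangular matrix with positive diagonal, $\det(R_{kk}^T R_{kk})^{1/2}$ is the product of the last $k$ diagonal entries. The function names are hypothetical.

```python
import math

def ball_volume(k, radius):
    """Volume of a k-dimensional ball: (pi r^2)^(k/2) / Gamma(k/2 + 1)."""
    return (math.pi * radius ** 2) ** (k / 2) / math.gamma(k / 2 + 1)

def layer_complexity_estimate(R, k, radius):
    """Approximate the k-th layer node count by V(S^k(radius)) / det(R_kk^T R_kk)^(1/2)."""
    m = len(R)
    # R_kk is the lower k x k block of the upper-triangular R, so its
    # determinant is the product of the last k diagonal entries.
    det_sqrt = math.prod(R[i][i] for i in range(m - k, m))
    return ball_volume(k, radius) / det_sqrt

R_example = [[2.0, 1.0], [0.0, 1.0]]
estimates = [layer_complexity_estimate(R_example, k, 1.0) for k in (1, 2)]
```

Small diagonal entries of `R` (an ill-conditioned channel) shrink the denominator and inflate the estimate, which matches the qualitative statement above.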



$^3$The covering radius $r_{\mathrm{cov}}(G)$ of a lattice $\Lambda(G)$ is the radius of the smallest sphere centred at the origin that contains $\mathcal{V}_0(G)$.



IV. Computational Complexity: Tail Distribution in the High SNR Regime 

In this section, we consider a squared sphere radius $R_s^2 = MT(1 + \zeta\log\rho)$, where $\zeta > 0$ is chosen sufficiently large
to ensure the existence of at least one lattice point inside the search sphere. The reason for this choice will become
evident as we further analyze the complexity of the decoder.

We are interested in finding an upper bound on the tail distribution of the decoder's computational
complexity at high SNR. This is summarized in the following theorem:

Theorem 1. The asymptotic computational complexity distribution of the MMSE-DFE sphere decoder in the $M \times N$
LAST coded MIMO channel with codeword length $T$ is upper bounded by

$$\Pr(C \ge L) \,\dot{\le}\, \rho^{-\eta(r)}, \tag{13}$$

under the condition that

$$L \ge m + V(S^m(2R_s)) \sum_{k=1}^{m} \frac{V(S^k(R_s))}{\det(R_{kk}^T R_{kk})^{1/2}}, \tag{14}$$

where the SNR exponent $\eta(r) = (M - r)(N - r)$ for $T \ge N + M - 1$. The matrix $R_{kk}$ is the lower $k \times k$ part of
$R = Q^T MG$.

Proof: We follow the same approach that is commonly used to upper bound the decoding error probability in the
quasi-static MIMO channel [13]. By separating the outage event from the non-outage event, we obtain:

$$\Pr(C \ge L) \le \Pr(\alpha \in O) + \Pr(C \ge L, \alpha \in \bar{O}). \tag{15}$$

Let us concentrate on bounding the second term in the RHS of (15). As mentioned in Section III.B, bounding this term
is considered difficult in general. However, as will be shown in the sequel, the analysis may be simplified by separating
the two events that correspond to the additive noise being located inside and outside a sphere of radius $R_s$. In this case,
one can upper bound the second term in the RHS of (15) as follows:

$$\Pr(C \ge L \,|\, \bar{O}) \le \Pr(C \ge L, \|e\|^2 \le R_s^2 \,|\, \bar{O}) + \Pr(\|e\|^2 > R_s^2). \tag{16}$$


It is clear from the above bound that the value of the sphere radius $R_s$ affects both terms in the RHS of (16). It is very
important to note that $R_s$ must be selected carefully, not only to ensure the existence of a lattice point inside the sphere,
but also to obtain a tail distribution that is upper bounded by the outage probability of the underlying decoding scheme.
The intuition behind this will be discussed later in this section. In order to select the sphere
radius appropriately, we study the behaviour of some parameters of the channel-code lattice $\Lambda(MG)$ when the channel
is not in outage.

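First, for nested lattice codes we have that (see [11])

$$|\mathcal{C}(\Lambda_c, \mathcal{R})| = 2^{RT} = \rho^{rT} = \frac{V(\mathcal{R})}{V_c}.$$

When the channel is not in outage, one can verify that the effective radius$^4$ of the channel-code lattice, $r_{\mathrm{eff}}(MG)$, is
asymptotically given by

$$r_{\mathrm{eff}}(MG) \doteq \left[\frac{\rho^{-rT}\det(M^T M)^{1/2}}{V(S^m(1))}\right]^{\frac{1}{2MT}} \doteq \rho^{\gamma}, \tag{17}$$

with $\gamma = \big[\sum_{i=1}^{M}(1 - \alpha_i)^+ - r\big]/2M > 0$ when the channel is not in outage.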

$^4$The radius of the sphere with volume equal to that of $\mathcal{V}_0(MG)$, i.e., $r_{\mathrm{eff}}(MG) = [V(\mathcal{V}_0(MG))/V(S^m(1))]^{1/m}$.

It is clear from (17) that, as $\rho \to \infty$, the volume of the Voronoi region $\mathcal{V}_0(MG)$ as well as $r_{\mathrm{eff}}(MG)$ grow quickly
with SNR as $\rho^{\gamma}$, where $\gamma > 0$ when the channel is not in outage. Hence, the sphere radius is required to increase with
SNR as well in order to ensure the existence of at least one lattice point inside the decoder's search sphere. However,
choosing $R_s = r_{\mathrm{eff}}(MG) \doteq \rho^{\gamma}$ results in too many points inside the sphere. Therefore, $R_s$ is required to grow with
SNR at a slower rate than $\rho^{\gamma}$. For that reason, we select the search radius to be $R_s = \sqrt{MT(1 + \zeta\log\rho)}$, where $\zeta > 0$
(asymptotically less than $\rho^{\gamma}$, for all $\zeta > 0$), and show that for sufficiently large $\zeta$, such a radius selection guarantees (with
high probability) the existence of at least one lattice point inside the sphere. This can be seen from the fact that



$$\Pr\big(\|e\|^2 > MT(1 + \zeta\log\rho)\big) \,\dot{\le}\, \rho^{-MT\zeta}, \tag{18}$$

where the inequality follows from applying the Chernoff bound. Now, for sufficiently large $\zeta$, the above probability becomes
negligible. In other words, asymptotically, one can expect that the received signal is highly likely to be located inside a
sphere of squared radius $R_s^2 = MT(1 + \zeta\log\rho)$.
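The Chernoff step can be checked numerically under the Gaussian noise model of (1). With $2MT$ i.i.d. noise entries of variance 1/2, $\|e\|^2$ follows a Gamma distribution with integer shape $MT$ and unit scale, so its tail has a closed form, and optimizing the Chernoff bound gives $e^{-(t - MT)}(t/MT)^{MT}$ for $t > MT$. The sketch below compares the two; the function names are hypothetical, and the natural logarithm is assumed for $\log\rho$.

```python
import math

def noise_tail_exact(MT, t):
    """Exact Pr(||e||^2 > t) when ||e||^2 ~ Gamma(shape=MT, scale=1), MT an integer."""
    term, total = 1.0, 1.0
    for j in range(1, MT):
        term *= t / j          # t^j / j!
        total += term
    return math.exp(-t) * total

def noise_tail_chernoff(MT, t):
    """Optimized Chernoff bound exp(-(t - MT)) * (t / MT)^MT, valid for t > MT."""
    return math.exp(-(t - MT)) * (t / MT) ** MT

def squared_radius(MT, zeta, rho):
    """Squared sphere radius MT * (1 + zeta * log(rho)) with the natural logarithm."""
    return MT * (1 + zeta * math.log(rho))

MT, zeta, rho = 4, 1.0, 100.0
t = squared_radius(MT, zeta, rho)
exact = noise_tail_exact(MT, t)
bound = noise_tail_chernoff(MT, t)
```

For this radius the bound equals $\rho^{-MT\zeta}(1 + \zeta\log\rho)^{MT}$, i.e., $\rho^{-MT\zeta}$ up to a polylogarithmic factor, which is consistent with the exponential inequality in (18).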

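Next, we consider bounding the first term in the RHS of (16) from above. By viewing the decoder as a search on a
tree, one can interpret $C$ as the total number of nodes that have been visited by the decoder. Therefore, assuming the
all-zero codeword was transmitted, one can rewrite $C$ as $C = m + \tilde{C}$, where

$$\tilde{C} = \sum_{k=1}^{m} \sum_{z_1^k \in \mathbb{Z}^k \setminus \{0\}} \phi(z_1^k).$$

Now, let $\phi_k(x)$ be the function defined by

$$\phi_k(x) = \begin{cases} C_k, & \text{if } \|e - Mx\|^2 \le R_s^2;\\ 0, & \text{otherwise}, \end{cases}$$

where $C_k$ is as defined in (12). Then, one can easily verify that

$$\tilde{C} \le \sum_{k=1}^{m} \sum_{x \in \Lambda_c^*} \phi_k(x),$$

where $\Lambda_c^* = \Lambda_c \setminus \{0\}$. For a given lattice $\Lambda_c$, using Markov's inequality, we have

$$\Pr(C \ge L \,|\, \Lambda_c) = \Pr(\tilde{C} \ge L - m \,|\, \Lambda_c) \le \frac{E_e\{\tilde{C} \,|\, \Lambda_c\}}{L - m} \tag{19}$$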

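for $L > m$. Taking the expectation of $\tilde{C}$ with respect to the noise, one can easily show that$^5$

$$\Pr(C \ge L, \|e\|^2 \le R_s^2 \,|\, \Lambda_c, \bar{O}) \le \frac{\sum_{k=1}^{m} C_k}{L - m} \sum_{x \in \Lambda_c^*} \Pr\big(\|e - Mx\|^2 \le R_s^2, \|e\|^2 \le R_s^2 \,|\, \bar{O}\big)$$
$$\overset{(a)}{\le} \frac{\sum_{k=1}^{m} C_k}{L - m} \sum_{x \in \Lambda_c^*} \Pr\big(\|Mx\|^2 \le 4R_s^2 \,|\, \bar{O}\big) = \frac{\sum_{k=1}^{m} C_k}{L - m} \sum_{x \in \Lambda_c^*} 1\{\|Mx\|^2 \le 4R_s^2\}, \tag{20}$$

where (a) follows from the fact that, for any random vectors $u$ and $v$ and $R_s > 0$, it holds that
$\{\|u - v\|^2 \le R_s^2, \|u\|^2 \le R_s^2\} \subseteq \{\|v\|^2 \le 4R_s^2\}$, and $1\{A\}$ denotes the indicator function of the event $A$. By taking the
expectation of (20) over the ensemble of random lattices (see [17], Theorem 4), we obtain

$$\Pr(C \ge L, \|e\|^2 \le R_s^2 \,|\, \bar{O}) \le \frac{\sum_{k=1}^{m} C_k}{L - m} \cdot \frac{V(S^m(2R_s))}{V_c \det(M^T M)^{1/2}} \,\dot{\le}\, \rho^{-T\left[\sum_{i=1}^{M}(1 - \alpha_i)^+ - r\right]}, \tag{21}$$

for $L \ge m + V(S^m(2R_s)) \sum_{k=1}^{m} C_k$. The last inequality follows from the fact that at high SNR we have $V_c \doteq \rho^{-rT}$
and $\det(M^T M)^{1/2} \doteq \rho^{T\sum_{i=1}^{M}(1 - \alpha_i)^+}$.

Following in the footsteps of [14], averaging (21) over the channels in the set $\bar{O}$, we have

$$\Pr(C \ge L, \|e\|^2 \le R_s^2) \le \int_{\bar{O}} f_\alpha(\alpha) \Pr(C \ge L, \|e\|^2 \le R_s^2 \,|\, \alpha)\, d\alpha \,\dot{\le}\, \rho^{-d_{\mathrm{out}}(r)}, \tag{22}$$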



$^5$At this point, we would like to remind the reader that for the case of MMSE-DFE lattice decoding, the additive noise vector is non-Gaussian
for finite $T$. However, one can show (see [14] and [15]) that, for a well-constructed lattice, the probability density function of the noise vector $e$ satisfies
$f_e(u) \le \beta_m f_{\tilde{e}}(u)$, where $\tilde{e} \sim \mathcal{N}(0, 0.5 I)$ and $\beta_m$ is a constant (which has no effect at high SNR).



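where $f_\alpha(\alpha)$ is the joint probability density function of $\alpha$ which, for all $\alpha \in \bar{O}$, is asymptotically given by (see [14])

$$f_\alpha(\alpha) \doteq \exp\Big(-\log(\rho) \sum_{i=1}^{M} (2i - 1 + |N - M|)\,\alpha_i\Big),$$

and $d_{\mathrm{out}}(r)$ is the outage SNR exponent that is given in (8).

Similar to [14], one can show that the behaviour of the first term in the RHS of (15) at high SNR is also $\rho^{-d_{\mathrm{out}}(r)}$.
Therefore, we finally have

$$\Pr(C \ge L) \,\dot{\le}\, \rho^{-d_{\mathrm{out}}(r)} \tag{23}$$

under the condition that $L$ satisfies (14).

The above results reveal that, if the number of computations performed by the decoder exceeds

$$L_0 = m + V(S^m(2R_s)) \sum_{k=1}^{m} \frac{V(S^k(R_s))}{\det(R_{kk}^T R_{kk})^{1/2}}, \tag{24}$$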



then the complexity distribution of the sphere decoder is upper bounded by its outage probability. As a result, one may
save on decoding complexity while still achieving near-outage performance by setting a time-out limit at the decoder,
so that when the computational complexity exceeds $L_0$ the decoder terminates the search. Such a time-out limit does not
affect the optimal tradeoff achieved by the modified decoding scheme. To see this, suppose that the sphere decoder
imposes a time-out limit so that the search is terminated once the number of computations reaches $L_0$, at which point the
decoder declares an error. Let $E_s$ be the event that the decoder makes an erroneous detection when $C < L_0$ (this event
occurs when the received signal $y \notin \mathcal{V}_x(MG)$, assuming $x$ was transmitted). In this case, the average error probability
is given by

$$P_e(\rho) = \Pr(E_s \cup \{C \ge L_0\}) \le \Pr(E_s) + \Pr(C \ge L_0) \,\dot{\le}\, \rho^{-d_{\mathrm{out}}(r)}. \tag{25}$$

However, since $L_0$ is random, it would be interesting to calculate the (minimum) average number of computations
required by the decoder to terminate the search.

V. Average Sphere Decoding Complexity 

It is to be expected that when the channel is ill-conditioned (i.e., in outage) the computational complexity becomes
extremely large. Moreover, when the channel is in outage it is highly likely that the decoder performs an erroneous
detection. However, when the channel is not in outage, there is still a non-zero probability that the number of computations
will become large (see (21)). As such, it is sometimes desirable to terminate the search even when the channel is not in
outage, especially when the sphere decoder is used under practically relevant runtime constraints. Therefore, we would
like to determine the minimum average number of computations that is required by the decoder to terminate the search
and declare an error without affecting the achievable DMT.
This can be expressed as

$$L_{\mathrm{out}} = E\{L_0(H^c \in \bar{O})\}, \tag{26}$$

where $L_0(H^c \in \bar{O})$ denotes the minimum number of computations performed by the decoder when the channel is not in
outage, which is given in (24). Before we do that, we first study the asymptotic (high SNR) behaviour
of $L_0$. As mentioned in Section I, due to their low encoding complexity and their ability to achieve the optimal DMT of the
channel, we focus our analysis on nested LAST codes, specifically LAST codes that are generated using Construction A,
which is described below (see [17]).

We consider the Loeliger ensemble of mod-$p$ lattices, where $p$ is a prime. First, we generate the set of all lattices
given by

$$\Lambda_p = \kappa\big(\mathsf{C} + p\mathbb{Z}^{2MT}\big),$$

where $p \to \infty$, $\kappa \to 0$ is a scaling coefficient chosen such that the fundamental volume $V_f = \kappa^{2MT} p^{2MT-1} = 1$, $\mathbb{Z}_p$
denotes the field of mod-$p$ integers, and $\mathsf{C} \subset \mathbb{Z}_p^{2MT}$ is a linear code over $\mathbb{Z}_p$ with generator matrix in systematic form
$[I \;\; P^T]^T$. We use a pair of self-similar lattices for nesting. We take the shaping lattice to be $\Lambda_s = \phi\Lambda_p$, where $\phi$ is
chosen such that the covering radius is 1/2 in order to satisfy the input power constraint. Finally, the coding lattice is
obtained as $\Lambda_c = \rho^{-r/2M}\Lambda_s$ to satisfy the transmission rate constraint $R(\rho) = r\log\rho$. Interestingly, one can construct a
generator matrix of $\Lambda_p$ as (see [1])

$$G_p = \kappa \begin{pmatrix} I & 0 \\ P & pI \end{pmatrix}, \tag{27}$$

which has a lower triangular form. In this case, one can express the generator matrix of $\Lambda_c$ as $G = \rho^{-r/2M} G'$, where
$G' = \phi G_p$. The lower triangular form of $G$ is useful for the following reason. If $M$ is an $m \times m$ arbitrary full-rank matrix, and $G$ is an $m \times m$
lower triangular matrix, then one can easily show that

$$\det[(MG)_{kk}] = \det(M_{kk})\det(G_{kk}),$$

where $(MG)_{kk}$, $M_{kk}$, and $G_{kk}$ are the lower $k \times k$ parts of $MG$, $M$, and $G$, respectively.
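The block-determinant identity for a lower triangular $G$ can be verified numerically. The sketch below is a plain-Python illustration with hypothetical helper names and arbitrary example matrices; it relies on the fact that, when $G$ is lower triangular, the lower-right $k \times k$ block of $MG$ equals the product of the lower-right blocks of $M$ and $G$.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def lower_block(A, k):
    """Lower-right k x k block of a square matrix."""
    return [row[-k:] for row in A[-k:]]

def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [list(map(float, row)) for row in A]
    n = len(A)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            result = -result
        result *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for j in range(c, n):
                A[r][j] -= f * A[c][j]
    return result

# Arbitrary full-rank M and lower-triangular G, chosen only for illustration.
M_example = [[2, 1, 0], [1, 3, 1], [4, 0, 5]]
G_example = [[1, 0, 0], [2, 3, 0], [-1, 4, 2]]
MG = matmul(M_example, G_example)
gaps = [abs(det(lower_block(MG, k))
            - det(lower_block(M_example, k)) * det(lower_block(G_example, k)))
        for k in (1, 2, 3)]
```

All entries of `gaps` are zero up to rounding; replacing `G_example` by a matrix that is not lower triangular generally breaks the identity for $k < m$.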
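Using the above result, one can express the determinant that appears in (14) as

$$\det(R_{kk}^T R_{kk}) = \det(M_{kk}^T M_{kk})\det(G_{kk}^T G_{kk}) = \rho^{-rk/M}\det(M_{kk}^T M_{kk})\det(G'^{\,T}_{kk} G'_{kk}). \tag{28}$$

Let $\mu_1 \le \mu_2 \le \cdots \le \mu_k$ be the ordered non-zero eigenvalues of $M_{kk}^T M_{kk}$, for $k = 1, \cdots, m$. Then,

$$\det(M_{kk}^T M_{kk}) = \prod_{j=1}^{k} \mu_j. \tag{29}$$

Note that for the special case when $k = m$ we have

$$\mu_{2(j-1)T+1} = \cdots = \mu_{2jT} = 1 + \rho\lambda_j\big((H^c)^H H^c\big), \qquad \text{for all } j = 1, \cdots, M. \tag{30}$$

Denote $\alpha'_j = -\log\mu_j/\log\rho$. Using (28), one can asymptotically express $L_0$ as

$$L_0 \doteq m + (\log\rho)^{m/2} \sum_{k=1}^{m} (\log\rho)^{k/2} \rho^{c_k}, \tag{31}$$

where $c_k$ denotes the SNR exponent of the $k$-th layer term. Now, since $c_k$ is non-decreasing in $k$, at the high SNR regime we have

$$L_0 \doteq m + (\log\rho)^{m} \rho^{c_m}. \tag{32}$$

The asymptotic average of $L_0$ (averaged over channel statistics) when the channel is not in outage is given by

$$E\{L_0(H^c \in \bar{O})\} = \int_{\bar{O}} L_0 f_\alpha(\alpha)\, d\alpha \doteq m + (\log\rho)^{m} \int_{\bar{O}} \exp\Big(\log\rho\Big[c_m - \sum_{i=1}^{M}(2i - 1 + N - M)\alpha_i\Big]\Big)\, d\alpha \doteq m + (\log\rho)^{m}\rho^{l(r)},$$

where $\bar{O} = \big\{\alpha \in \mathbb{R}^M_+ : \sum_{i=1}^{M}(1 - \alpha_i)^+ \ge r\big\}$ and

$$l(r) = \max_{\alpha \in \bar{O}} \Big[c_m - \sum_{i=1}^{M}(2i - 1 + N - M)\alpha_i\Big]. \tag{33}$$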



It is not difficult to see that the channel coefficients that maximize (33) are

\[ \alpha_i^* = 1, \quad \text{for } i = 1, \ldots, M-k, \]

and

\[ \alpha_i^* = 0, \quad \text{for } i = M-k+1, \ldots, M, \]

i.e., the same $\alpha^*$ that achieves the optimal DMT of the channel, where $k = r$ is an integer multiplexing gain. Substituting $\alpha^*$ in (33), we get

\[ l(r) = \frac{Tr(M-r)}{M} - (M-r)(N-r), \tag{34} \]

for $r = 0, 1, \ldots, M$. In this case, the asymptotic minimum average computational complexity that is required by the decoder to achieve near-optimal performance, when the channel is not in outage, can be expressed as

\[ L_{\mathrm{out}} = 2MT + (\log\rho)^{2MT} \rho^{l(r)}. \tag{35} \]
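As a quick check of the second term in (34), the following worked step evaluates the sum $\sum_i (2i-1+N-M)\alpha_i$ at the maximizing coefficients, i.e., with $\alpha_i^* = 1$ for the first $M-r$ indices and $\alpha_i^* = 0$ otherwise. It uses only the identity $\sum_{i=1}^{n}(2i-1) = n^2$, here with $n = M - r$.

```latex
\sum_{i=1}^{M-r} (2i - 1 + N - M)
  = \sum_{i=1}^{M-r} (2i - 1) + (M-r)(N-M)
  = (M-r)^2 + (M-r)(N-M)
  = (M-r)(N-r).
```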

Some interesting remarks can be drawn from the above analysis. Consider a MIMO system with arbitrary $M$ and $N$, and assume the use of an optimal random nested LAST code of codeword length $T \ge N + M - 1$ and fixed rate $R$, i.e., $r = 0$. In this case, one can see that $l(0) = -MN < 0$ irrespective of the value of $T$ (i.e., the average complexity is asymptotically bounded for all $T$). It is clear that the term $(\log\rho)^{2MT}\rho^{-MN}$ decays quickly to zero as $\rho \to \infty$. The
simulation results (introduced next) agree with the above analysis (see Fig. 1).
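The decay of the second term in (35) at fixed rate can be illustrated numerically. This is a rough sketch rather than the authors' computation: it assumes base-2 logarithms (the text does not state the base), treats the asymptotic expression as if it were exact, and uses the hypothetical function names `complexity_exponent` and `sphere_avg_complexity`.

```python
import math


def complexity_exponent(r, M, N, T):
    """Exponent l(r) as given in (34)."""
    return T * r * (M - r) / M - (M - r) * (N - r)


def sphere_avg_complexity(snr_db, r, M, N, T):
    """Evaluate 2MT + (log rho)^(2MT) * rho^l(r), with log taken base 2 (assumed)."""
    rho = 10 ** (snr_db / 10)
    m = 2 * M * T  # signal dimension
    return m + math.log2(rho) ** m * rho ** complexity_exponent(r, M, N, T)


# Fixed rate (r = 0) for M = N = 2, T = 3: the value approaches m = 12.
for snr_db in (30, 40, 60):
    print(snr_db, sphere_avg_complexity(snr_db, 0, 2, 2, 3))
```

For this configuration $l(0) = -4$, so the polynomial decay in $\rho$ eventually dominates the $(\log\rho)^{12}$ growth.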

It is clear from the above analysis that, for a given multiplexing gain $0 < r < M$, the sphere decoder has much lower asymptotic average computational complexity than the exhaustive ML decoder, whose decoding complexity is $2^{RT} = \rho^{rT}$. However, there exists a cut-off multiplexing gain, say $r_0$, below which the average computational complexity of the sphere decoder remains bounded. This should not be a surprising result, since the sphere decoder can be viewed simply as a search in a tree. It is well known that the ultimate limit to tree search decoding, such as sequential decoding [4], is the computational cut-off rate, a rate above which the average number of visited nodes in the tree (i.e., computations) is unbounded. Sphere decoding algorithms are no exception. Instead of the cut-off rate, we define a cut-off multiplexing gain for the outage-limited MIMO channel. This value can be easily found by setting $l(r_0) = 0$ in (35), which results in

\[ r_0 = \frac{MN}{M+T}. \]



Interestingly, if we let the number of receive antennas $N \to \infty$, then (assuming $T = N + M - 1$) one can achieve a cut-off multiplexing gain $r_0 = M$, which is the maximum multiplexing gain achieved by the channel. This shows that one can dramatically improve the computational complexity of the decoder by increasing the number of antennas at the receiver side. Unfortunately, the sphere decoder cannot maintain very low decoding complexity while also maintaining the maximal diversity (or the optimal tradeoff) that can be achieved by the channel, especially for the case of nested LAST codes discussed previously. For the case of MMSE-DFE sphere decoding, achieving the maximum diversity $MN$ requires the use of LAST codes with codeword lengths $T \ge N + M - 1$. Increasing the number of receive antennas $N$ requires increasing $T$ as well, and hence the second term in (35) does not decay very quickly to zero. It turns out that the sphere decoder may achieve linear computational complexity $m = 2MT$ (the signal dimension) at high SNR for a large enough number of antennas $N$ and fixed $T$, however at the expense of losing the maximum diversity
MN (or losing the optimal tradeoff).
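The cut-off multiplexing gain and its behaviour for large $N$ can be reproduced with exact rational arithmetic. The sketch below assumes the expression $r_0 = MN/(M+T)$ and the exponent $l(r)$ from (34); the function names `complexity_exponent` and `cutoff_gain` are illustrative only.

```python
from fractions import Fraction


def complexity_exponent(r, M, N, T):
    """Exponent l(r) from (34), evaluated exactly with rationals."""
    r = Fraction(r)
    return T * r * (M - r) / M - (M - r) * (N - r)


def cutoff_gain(M, N, T):
    """Cut-off multiplexing gain r0 = MN / (M + T)."""
    return Fraction(M * N, M + T)


M = 2
for N in (2, 10, 100, 1000):
    T = N + M - 1  # shortest codeword length considered in the text
    r0 = cutoff_gain(M, N, T)
    # l(r0) is exactly zero, and r0 approaches M as N grows.
    print(N, r0, float(r0), complexity_exponent(r0, M, N, T))
```

With $T = N + M - 1$ the cut-off gain equals $MN/(N + 2M - 1)$, which makes the limit $r_0 \to M$ explicit.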



A. Sphere vs. Sequential Decoding 

A more efficient decoder that is capable of achieving good error performance with much lower decoding complexity compared to the sphere decoder is the so-called lattice sequential decoder [4], [5]. Such a decoder, inspired by conventional sequential decoding algorithms such as the Fano and Stack algorithms, provides excellent performance-complexity tradeoffs through the use of a decoding parameter called the bias. It has been shown in [5] that for a small fixed bias the average decoding complexity of the MMSE-DFE lattice sequential decoder is given by

\[ L_{\mathrm{sequential}} = 2MT + (\log\rho)^{MT} \rho^{l(r)}, \tag{36} \]



where $l(r)$ is as defined in (34). For a fixed rate $R$, i.e., for $r = 0$, the ratio of the average complexity of both decoders, say $\gamma$, is given by

\[ \gamma = \frac{L_{\mathrm{sphere}}}{L_{\mathrm{sequential}}} = \frac{2MT + (\log\rho)^{2MT}/\rho^{MN}}{2MT + (\log\rho)^{MT}/\rho^{MN}}. \]

It is clear from the above ratio that sequential decoding saves on average computational complexity at high SNR, especially for large signal dimensions. For example, consider the case of a $3 \times 3$ LAST coded MIMO system with $T = 5$ and fixed rate. At $\rho = 10^3$ (30 dB), we have $\gamma \approx 31$, i.e., the sphere decoder's complexity is about 31 times larger than the complexity of the lattice sequential decoder. As will be shown in the sequel, simulation results agree with the above theoretical analysis. For $\rho < 30$ dB, one would expect the ratio $\gamma \gg 31$. For extremely high SNR values (e.g., $\rho \gg 30$ dB), it seems that $\gamma \to 1$ as $\rho \to \infty$. However, as will be shown next, the reduction in the computational complexity of the sequential decoder comes at the price of some performance loss compared to the sphere decoder. The performance
loss increases as the codeword length T increases. Hence, there is a tradeoff.
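The quoted ratio can be reproduced from the two asymptotic expressions. The sketch below is not the authors' code: it assumes base-2 logarithms (an assumption that happens to reproduce the quoted value of about 31), exponents $2MT$ and $MT$ on $\log\rho$ for the sphere and sequential decoders respectively, and a hypothetical function name `gamma_ratio`.

```python
import math


def gamma_ratio(snr_db, M, N, T):
    """Ratio of sphere to lattice sequential average complexity at r = 0."""
    rho = 10 ** (snr_db / 10)
    m = 2 * M * T                 # signal dimension
    log_rho = math.log2(rho)      # base 2 is an assumption
    sphere = m + log_rho ** m / rho ** (M * N)
    sequential = m + log_rho ** (M * T) / rho ** (M * N)
    return sphere / sequential


# 3 x 3 system with T = 5: close to 31 at 30 dB, far larger at lower SNR,
# and approaching 1 at very high SNR.
for snr_db in (20, 30, 100):
    print(snr_db, gamma_ratio(snr_db, 3, 3, 5))
```

At 30 dB the sequential term is negligible, so the ratio is essentially $1 + (\log_2\rho)^{30}/(30\,\rho^{9})$.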

VI. Simulation Results 

A. Tail Distribution & Average Complexity 

The average computational complexity of the MMSE-DFE sphere decoder is plotted in Fig. 1 for a LAST coded MIMO system with $M = N = 2$ and $R = 4$ bits per channel use (bpcu), for codeword lengths $T = 3, 4, 5$. The figure shows how the average number of computations decays very quickly to $m = 2MT$ (the signal dimension) at high SNR, even for large values of $T$. Fig. 2 demonstrates that the MMSE-DFE sphere decoder has a cut-off rate such that the average complexity of the decoder remains bounded as long as we operate below it. The figure shows that for fixed $M$, $N$, and $T$, if we increase the rate, the average complexity increases as well and may become unbounded even at high SNR.

B. Performance-vs-Complexity 

An example of the performance-complexity tradeoff that results from using the lattice sequential decoder instead of the sphere decoder is depicted in Fig. 3. Here, we consider a nested LAST coded $3 \times 3$ MIMO channel with $T = 5$ and $R = 4$ bits per channel use. One can notice the amount of computations saved by the lattice sequential decoder for all values of SNR. For example, at $\rho = 30$ dB, the average complexity of the sphere decoder is about 30 times the complexity of the lattice sequential decoder for an optimal LAST coded MIMO system with dimension $m = 30$. This is achieved at the expense of some loss in performance (~0.6 dB).

Finally, in Fig. 4 we compare the error performance of several lattice decoders that achieve the optimal tradeoff of the channel, including the MMSE-DFE sphere decoder, the MMSE-DFE sequential decoder, and the lattice-reduction aided MMSE-DFE decoder, for a $3 \times 3$ LAST coded MIMO channel with $T = 5$ and rate $R = 12$ bpcu. It is clear from the figure that, although all of these decoders are capable of achieving the maximal diversity 9, the lattice-reduction aided MMSE-DFE decoder suffers a large performance loss (~2.7 dB) compared to the MMSE-DFE sphere decoder. For larger values of $T$, the SNR gap between those decoders will become wider. Therefore, in order to perform very close to the outage performance, one needs to resort to a more reliable decoder such as the sphere decoder.



VII. Summary 



In this paper, we have provided a complete analysis of the computational complexity of the MMSE-DFE sphere decoder applied to the LAST coded MIMO channel in the high SNR regime. An upper bound on the asymptotic complexity distribution has been derived. It has been shown that if the number of computations performed by the decoder exceeds a certain limit, the complexity's tail distribution becomes upper bounded by the asymptotic outage probability achieved by the LAST coding and sphere decoding schemes. As a result, the tradeoff of the MIMO channel is naturally extended to include the decoder's complexity. The average number of computations that is required to terminate the search when the channel is not in outage has been calculated in terms of the system parameters $(\rho, M, N, T)$. In order to achieve high order diversity, the number of antennas and the codeword length must be increased simultaneously, causing the complexity of the decoding to increase. As expected, MMSE-DFE preprocessing significantly improves the overall computational complexity of the underlying decoding scheme. However, it has been shown that there exists a cut-off multiplexing gain below which the average complexity remains bounded. Finally, the performance-complexity tradeoff achieved by several decoders that perform lattice decoding has been discussed.
Fig. 1. The reduction in computational complexity achieved by the MMSE-DFE lattice decoder for all values of T that achieve maximum diversity 4. All curves decay quickly to m = 2MT = 4T at high SNR.




Fig. 2. Plots of the average complexity of the MMSE-DFE sphere decoder for an optimal nested LAST coded 2x2 MIMO system with different 
rates R in bpcu. 
Fig. 3. (a) Performance and (b) average computational complexity comparison between sphere decoding and lattice sequential decoding for a signal with dimension m = 30.




Fig. 4. Performance comparison between several lattice decoders that achieve the optimal DMT of the channel for a LAST coded MIMO system 
with dimension m = 30. 



